Olympic Exploration
Introduction
The Olympics are a global celebration of the world's most exceptional athletes and promote the values of peace and unity. Winning a gold medal for one's country is the ultimate honor for any athlete. Such an accomplishment may be attributed to apparent variables, such as an athlete's physical traits or the type of Olympics (summer or winter), as well as to less apparent factors. In this blog post, we investigate whether a country's total GDP and population affect the number of medals it wins at the Olympics, alongside athlete-level traits such as gender, height, and weight. Our goal is to better understand the factors that contribute to a country's success at the Games and to share this information in the hope of leveling the playing field.
USA Medal Count by Type
The medal time graph above presents the number of medals the United States has won in each Olympic year, providing insight into the country's performance at the Games. The steep swings in the graph reflect the difference in medal counts between the summer and winter Olympics: the United States tends to win more medals at the summer Games than at the winter Games. Two peaks stand out. The first is the 1904 Olympics in St. Louis, MO, where the majority of athletes were American, largely because travel was difficult at the time and most countries had to reach the Games by boat. As a result, the United States won a very large number of medals. The second is the 1984 Summer Olympics in Los Angeles, CA, which many communist nations (e.g., the Soviet Union) boycotted, giving the United States a greater opportunity to win medals.
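A row-level tally like the one behind this graph can be sketched in Python. This is a minimal sketch, not the code used for the graph: it assumes rows shaped like the Kaggle athlete-events data, with `NOC`, `Year`, `Season`, and `Medal` fields and non-medal rows marked `"NA"`, and the function name `medals_by_year` and the sample rows are hypothetical. Because each row is one athlete in one event, a team medal is counted once per team member in this approach.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

NO_MEDAL = (None, "", "NA")  # assumed markers for rows without a medal


def medals_by_year(rows: Iterable[Mapping[str, object]], noc: str = "USA") -> dict[tuple[int, str], int]:
    """Count medal-winning rows per (Year, Season) for one National Olympic Committee code."""
    counts: Counter = Counter()
    for row in rows:
        if row.get("NOC") == noc and row.get("Medal") not in NO_MEDAL:
            counts[(int(row["Year"]), str(row["Season"]))] += 1
    # Sort by year, then season, so the result reads like a time series.
    return dict(sorted(counts.items()))


# Hypothetical rows for illustration only; these are not real results.
sample_rows = [
    {"NOC": "USA", "Year": 1984, "Season": "Summer", "Medal": "Gold"},
    {"NOC": "USA", "Year": 1984, "Season": "Summer", "Medal": "Silver"},
    {"NOC": "USA", "Year": 1984, "Season": "Summer", "Medal": "NA"},
    {"NOC": "USA", "Year": 1988, "Season": "Winter", "Medal": "Bronze"},
    {"NOC": "CAN", "Year": 1988, "Season": "Winter", "Medal": "Gold"},
]

print(medals_by_year(sample_rows))  # {(1984, 'Summer'): 2, (1988, 'Winter'): 1}
```

Plotting these counts against year, with summer and winter Games interleaved, produces the kind of zig-zag pattern described above.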
USA Medal Count by Gender
The medal time graph above shows the historical medal counts for US women and US men at the Olympics. Although the blue line (men) generally sits above the pink line (women), indicating that men have won more medals overall, the gap between the two lines has narrowed over time. In the early 1900s, American men won the large majority of US medals, whereas by the late 1900s, men's and women's medal counts had become much more similar. This trend reflects the emergence of women in athletics at the international level.
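The narrowing gap can be made concrete by tallying men's and women's medals separately for each Games. A minimal sketch, assuming the same Kaggle-style rows with a `Sex` field coded `"M"` or `"F"`; `gender_gap_by_year` and the sample rows are hypothetical illustrations, not the blog's data. A positive gap means men won more medals that year.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Mapping


def gender_gap_by_year(rows: Iterable[Mapping[str, object]], noc: str = "USA") -> dict[int, tuple[int, int, int]]:
    """Return {year: (men's medals, women's medals, men minus women)} for one country."""
    tallies: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for row in rows:
        if row.get("NOC") != noc or row.get("Medal") in (None, "", "NA"):
            continue  # skip other countries and non-medal rows
        sex = row.get("Sex")
        if sex == "M":
            tallies[int(row["Year"])][0] += 1
        elif sex == "F":
            tallies[int(row["Year"])][1] += 1
    return {year: (men, women, men - women) for year, (men, women) in sorted(tallies.items())}


# Hypothetical rows for illustration only.
sample_rows = [
    {"NOC": "USA", "Year": 1920, "Sex": "M", "Medal": "Gold"},
    {"NOC": "USA", "Year": 1920, "Sex": "M", "Medal": "Silver"},
    {"NOC": "USA", "Year": 1920, "Sex": "M", "Medal": "Bronze"},
    {"NOC": "USA", "Year": 1920, "Sex": "F", "Medal": "Gold"},
    {"NOC": "USA", "Year": 2012, "Sex": "M", "Medal": "Gold"},
    {"NOC": "USA", "Year": 2012, "Sex": "F", "Medal": "Gold"},
    {"NOC": "USA", "Year": 2012, "Sex": "F", "Medal": "Silver"},
]

print(gender_gap_by_year(sample_rows))  # {1920: (3, 1, 2), 2012: (1, 2, -1)}
```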
2016 Olympic Data Analysis
The graph above depicts the relationship between a country's GDP and its medal count at the 2016 Olympic Games. A large cluster of countries with low GDP and low medal counts is evident, indicating that many smaller economies did not win many medals. In our data, only two countries, the United States and China, surpassed 50 medals, consistent with their high GDPs. The size of each point corresponds to a country's population, but population appears to be less predictive of medal count than GDP: large and small points appear at both ends of the y-axis (medal count).
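One way to put a number on the claim that population is less predictive than GDP is to compare correlation coefficients. A minimal sketch, assuming one value per country for GDP, population, and medal count; the graph itself does not report correlations, and the function names here are hypothetical. Because the United States and China sit far from the main cluster, a rank-based (Spearman) coefficient, which is less sensitive to such extreme points, is included alongside Pearson's.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```

With real per-country lists, comparing `pearson(gdp, medals)` with `pearson(population, medals)` (and the Spearman equivalents) would show whether GDP tracks medal counts more closely, as the graph suggests.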
GDP and Population Country Maps (2010)
GDP
This map displays total GDP in 2010 for countries around the world. Lighter shades indicate lower total GDP, while darker shades indicate higher total GDP. China, the United States, and Japan are among the countries with the highest total GDP. Users can hover over a country to see its name and total GDP.
Population
This map displays the population of countries around the world in 2010. Lighter shades indicate a relatively low population, while darker shades indicate a relatively high population. China and India have the largest populations by a considerable margin. Users can hover over a country to see its name and population.
Map of Medals Won in the 2016 Olympics by Population
This map displays the medals each country won at the 2016 Olympics relative to its 2016 population. Lighter shades indicate relatively few medals for the population, while darker shades indicate relatively many. Grey marks countries that did not win any medals at the 2016 Olympics. Georgia, Azerbaijan, and Denmark are among the countries with the most medals relative to their populations. Countries with high GDPs or large populations, such as the United States, China, Japan, and Brazil, all won medals. Users can hover over a country to see its name and the medals it won relative to its population. The maps were created using ggplot2 in R.
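The blog does not state exactly how medals were scaled by population, so this sketch assumes a simple per-capita rate: medals per million residents. The function name, country labels, and numbers are hypothetical. Countries with no medals return `None`, mirroring the grey shading on the map.

```python
from __future__ import annotations

from typing import Mapping, Optional


def medals_per_million(medals: Mapping[str, int], population: Mapping[str, int]) -> dict[str, Optional[float]]:
    """Medals per million residents; None marks countries with no medals or no usable population."""
    rates: dict[str, Optional[float]] = {}
    for country, people in population.items():
        won = medals.get(country, 0)
        if won > 0 and people > 0:
            rates[country] = won * 1_000_000 / people
        else:
            rates[country] = None  # drawn in grey on the map
    return rates


# Hypothetical inputs: a small country with few medals can outrank a larger one.
medals = {"Country A": 10, "Country B": 1}
population = {"Country A": 5_000_000, "Country B": 100_000, "Country C": 2_000_000}
print(medals_per_million(medals, population))
# {'Country A': 2.0, 'Country B': 10.0, 'Country C': None}
```

This is why small countries such as those named above can appear darker than much larger medal winners on a per-capita map.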
Athlete Medal Analysis
Height (All Athletes)
| Cluster | Minimum (cm) | Q1 (cm) | Median (cm) | Q3 (cm) | Maximum (cm) | Mean (cm) | Standard Deviation (cm) | Count |
|---|---|---|---|---|---|---|---|---|
| 1 | 136 | 163 | 167 | 170 | 179 | 166.01 | 6.08 | 9903 |
| 2 | 164 | 187 | 190 | 195 | 223 | 191.33 | 6.66 | 6679 |
| 3 | 162 | 176 | 180 | 183 | 203 | 179.39 | 4.73 | 13599 |
Weight (All Athletes)
| Cluster | Minimum (kg) | Q1 (kg) | Median (kg) | Q3 (kg) | Maximum (kg) | Mean (kg) | Standard Deviation (kg) | Count |
|---|---|---|---|---|---|---|---|---|
| 1 | 28 | 55 | 59 | 64 | 90 | 58.80 | 6.84 | 9903 |
| 2 | 70 | 87 | 92 | 98 | 182 | 94.05 | 11.06 | 6679 |
| 3 | 54 | 70 | 75 | 79 | 104 | 74.68 | 6.29 | 13599 |
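The per-cluster columns in the tables above (minimum, quartiles, median, maximum, mean, standard deviation, count) can be computed with Python's standard library. A minimal sketch, assuming each cluster's heights (or weights) are available as a list; `summarize` is a hypothetical helper. The quantile rule behind the original tables is not stated, so `method="inclusive"`, which interpolates linearly between order statistics in the same way as R's default `quantile()`, is an assumption, and the standard deviation is the sample (n - 1) version, as in R's `sd()`.

```python
from __future__ import annotations

import statistics
from typing import Sequence


def summarize(values: Sequence[float]) -> dict[str, float]:
    """Five-number summary plus mean, sample standard deviation, and count (needs >= 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # divides by n - 1
        "count": len(values),
    }


print(summarize([1, 2, 3, 4, 5]))
```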
The goal here is to examine whether height and weight influence an athlete's success at the Olympics. In the dataset containing all athletes, cluster 3 won the most medals. Judging by the tables above, cluster 3 is the middle group by both height and weight (mean 179.39 cm and 74.68 kg), between cluster 1 (166.01 cm, 58.80 kg) and cluster 2 (191.33 cm, 94.05 kg). This suggests that athletes of middling height and weight have won more medals than either shorter, lighter athletes or taller, heavier ones. Similarly, the shortest and lightest group (cluster 1) has won more medals than the tallest and heaviest group (cluster 2). Generally speaking, athletes in the middle range of height and weight have historically been the most successful at the Olympics.
Similarly, we can look at each type of medal across these three clusters. The same trend generally holds for each medal type, with the middle group winning the most gold, silver, and bronze medals. However, the difference between clusters 1 and 2 is much smaller.
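Once each medal-winning row has a cluster label, the medal-type breakdown is a simple cross-tabulation. A minimal sketch, assuming parallel lists of cluster labels and medal names (`"Gold"`, `"Silver"`, `"Bronze"`); `medals_by_cluster` and the sample values are hypothetical. Filtering the rows to `NOC == "USA"` before tallying would give a US-only version.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence

MEDAL_TYPES = ("Gold", "Silver", "Bronze")


def medals_by_cluster(labels: Sequence[int], medals: Sequence[str]) -> dict[int, dict[str, int]]:
    """Count each medal type per cluster; labels[i] and medals[i] describe the same row."""
    if len(labels) != len(medals):
        raise ValueError("labels and medals must have the same length")
    counts = Counter(zip(labels, medals))  # missing (cluster, medal) pairs count as 0
    return {
        cluster: {medal: counts[(cluster, medal)] for medal in MEDAL_TYPES}
        for cluster in sorted(set(labels))
    }


# Hypothetical labels and medals for six medal-winning rows.
cluster_labels = [3, 3, 1, 2, 3, 1]
medal_names = ["Gold", "Bronze", "Gold", "Silver", "Gold", "Bronze"]
print(medals_by_cluster(cluster_labels, medal_names))
# {1: {'Gold': 1, 'Silver': 0, 'Bronze': 1}, 2: {'Gold': 0, 'Silver': 1, 'Bronze': 0}, 3: {'Gold': 2, 'Silver': 0, 'Bronze': 1}}
```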
US Athlete Medal Analysis
The same trend observed in the prior bar graph holds here. For American athletes, cluster 1 accounts for the most medals. As before, this is the middle group of athletes by height and weight; k-means assigns cluster numbers arbitrarily, so the middle group can receive a different label when the clustering is run on a different subset of the data.
Restricting the data to American athletes may lend further support to the initial observation that athletes in the middle range of height and weight have historically been the most successful.
These bar graphs draw on a dataset containing every recorded Olympic medal winner in the competition's history. I clustered the data into three groups by height and weight using k-means clustering, then measured the total medal count for each cluster, separated by medal type. The first bar graph displays all athletes, whereas the second displays only American athletes.
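The clustering step can be sketched with Lloyd's k-means algorithm on (height, weight) pairs. This is a minimal sketch under stated assumptions, not the blog's implementation: the deterministic farthest-point start is a simplification chosen here so the result is reproducible, and whether the original features were standardized is not stated, so raw centimeters and kilograms are used. If the clustering was done with R's built-in `kmeans()`, which defaults to the Hartigan-Wong algorithm with randomly chosen starting centers, its labels and centers may differ from this sketch's.

```python
from __future__ import annotations

from typing import Sequence

Point = tuple[float, float]  # (height_cm, weight_kg)


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _farthest_point_init(points: Sequence[Point], k: int) -> list[Point]:
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Point], k: int = 3, max_iter: int = 100) -> tuple[list[int], list[Point]]:
    """Lloyd's k-means: returns (label per point, final centers). Labels 0..k-1 are arbitrary."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centers = _farthest_point_init(points, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties out
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centers


# Hypothetical (height, weight) pairs forming three loose groups.
sample_points = [(160, 55), (162, 57), (161, 56), (190, 95), (192, 93),
                 (191, 94), (178, 74), (180, 76), (179, 75)]
labels, centers = kmeans(sample_points)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Note that the middle group here receives label 2 only because of the starting order; relabeling between runs is expected behavior for k-means.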
Limitations:
First, different sports call for different physical builds. For example, a heavyweight weightlifter competing in the Olympics needs to be big. By contrast, a gymnast needs to be much lighter and leaner, and in practice is often much shorter as well. These physical traits give athletes competitive advantages in their own sports, so it would be unfair to assert that one specific height and weight leads to the most success. That may be true overall; however, once the sports are analyzed individually, these clusters give much less definitive results.
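The sport-by-sport check described above amounts to stratifying medalists by sport before summarizing them. A minimal sketch, assuming rows with `Sport`, `Height`, and `Weight` fields as in the Kaggle athlete data, with unrecorded values assumed to appear as `"NA"`; `build_by_sport` is hypothetical and the sample values are illustrative, not real measurements.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import fmean
from typing import Iterable, Mapping

MISSING = (None, "", "NA")  # assumed markers for unrecorded height or weight


def build_by_sport(rows: Iterable[Mapping[str, object]]) -> dict[str, tuple[float, float, int]]:
    """Mean height (cm), mean weight (kg), and number of complete rows per sport."""
    heights: dict[str, list[float]] = defaultdict(list)
    weights: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        h, w = row.get("Height"), row.get("Weight")
        if h in MISSING or w in MISSING:
            continue  # skip incomplete rows so the two lists stay aligned
        sport = str(row["Sport"])
        heights[sport].append(float(h))
        weights[sport].append(float(w))
    return {
        sport: (fmean(heights[sport]), fmean(weights[sport]), len(heights[sport]))
        for sport in sorted(heights)
    }


# Hypothetical rows for illustration only.
sample_rows = [
    {"Sport": "Gymnastics", "Height": 150, "Weight": 45},
    {"Sport": "Gymnastics", "Height": 160, "Weight": 55},
    {"Sport": "Weightlifting", "Height": 180, "Weight": 140},
    {"Sport": "Weightlifting", "Height": "NA", "Weight": 100},
]
print(build_by_sport(sample_rows))
# {'Gymnastics': (155.0, 50.0, 2), 'Weightlifting': (180.0, 140.0, 1)}
```

Running the clustering separately within each sport would be a natural extension of this idea.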
Similarly, we cannot distinguish athletes who compete as individuals from those who compete as part of a team. A gymnast in an individual event, for example, relies only on themselves for success during the competition. A basketball player, on the other hand, relies on themselves as well as their teammates. Additionally, team athletes may have physical builds that complement those of their teammates: a basketball team will have a point guard, who is typically shorter and lighter, alongside a center, who is typically taller and heavier. Our inability to distinguish between team and individual athletes is therefore another limitation of our k-means clustering analysis.
Conclusion
We chose a broad topic for this blog. There are many possible factors behind an athlete's performance worth exploring, and this blog examines several of them. In the end, we focused on both national and individual attributes, as both are important at the Olympics.
At the national level, economic strength appears to play a major role in a nation's success at the Olympics. There are many possible reasons for this, but it seems that the wealthier a nation is, the more money it can invest in the resources needed to develop exceptional athletes. Economic power is by no means the only factor in an athlete's development, and the medal counts of nations like the United States and Great Britain are also skewed upward by how many sports they participate in compared with nations with smaller populations and economies. Often, the nations with the highest GDPs are also those that have competed in many Olympic sports for well over 100 years. In short, GDP is best read as an indicator of which countries may have had the luxury of investing heavily in athletics.
At the athlete level, the story of success is much more complex. Our visualizations explore several athlete traits. One major observation is that the gap between the total medal counts of American men and women has narrowed considerably over time. This suggests that, at least for the United States, one of the highest-GDP nations, medals are now shared much more evenly between men and women.
We also analyzed the impact of more specific physical traits: height and weight. The middle group of athletes by height and weight may have been the most successful throughout Olympic history. However, this observation should be treated as tentative, given the limitations of our data described above.
Ideally, we would identify which of these factors has the greatest impact on success; however, from this analysis alone, it is not yet clear which factor matters most. At the very least, we see that several factors potentially influence success at the Olympics.
References
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.
Vanderkam D, Allaire J, Owen J, Gromer D, Thieurmel B (2018). dygraphs: Interface to ‘Dygraphs’ Interactive Time Series Charting Library. R package version 1.1.1.6, https://CRAN.R-project.org/package=dygraphs.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Xie Y, Cheng J, Tan X (2023). DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.27, https://CRAN.R-project.org/package=DT.
Arnold J (2021). ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’. R package version 4.2.4, https://CRAN.R-project.org/package=ggthemes.
C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020.
Simon Garnier, Noam Ross, Robert Rudis, Antônio P. Camargo, Marco Sciaini, and Cédric Scherer (2021). viridis - Colorblind-Friendly Color Maps for R. R package version 0.6.2.
R. Pruim, D. T. Kaplan and N. J. Horton. The mosaic Package: Helping Students to ‘Think with Data’ Using R (2017). The R Journal, 9(1):77-102.
Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.
Heesoo, K. (2017), “120 years of Olympic history: athletes and results” (Version 2), Kaggle, available at https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results.
Devakumar, K. P. (2021), “World population 1960-2018” (Version 6), Kaggle, available at https://www.kaggle.com/imdevskp/world-population-19602018.
Loong, Ho. (2021), “GDP of each country and region(1960-2020)” (Version 3), Kaggle, available at https://www.kaggle.com/holoong9291/gdp-of-all-countries19602020.